Skip to content

Conversation

@Jzjsnow
Copy link
Contributor

@Jzjsnow Jzjsnow commented Sep 9, 2025

Why are the changes needed?

Close #3775.

Brief change log

Add support for MSE based refresh event:

  • Support for calculating partition filesize mean square error based on the loaded metadata.
  • Filter partitions need to be optimized based on threshold and trigger pendingInput evaluation if necessary.

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before making a pull request

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@Jzjsnow Jzjsnow force-pushed the add_support_for_pluggable_refresh_event branch 2 times, most recently from cd57764 to c8734fb Compare September 29, 2025 09:11
@Jzjsnow Jzjsnow changed the title [AMORO-3775] Add support for pluggable refresh event in TableRuntimeRefreshExecutor [AMORO-3775] Add support for metric-based refresh event trigger in TableRuntimeRefreshExecutor Sep 29, 2025
@Jzjsnow Jzjsnow force-pushed the add_support_for_pluggable_refresh_event branch 2 times, most recently from c352653 to f00825b Compare September 29, 2025 09:41
@xxubai
Copy link
Contributor

xxubai commented Oct 27, 2025

Can we move forward with this feature now? @Jzjsnow @klion26

@Jzjsnow Jzjsnow force-pushed the add_support_for_pluggable_refresh_event branch 2 times, most recently from 7782ac7 to ab4b971 Compare October 31, 2025 07:47
@Jzjsnow Jzjsnow force-pushed the add_support_for_pluggable_refresh_event branch from ab4b971 to 519d183 Compare October 31, 2025 07:50
@Jzjsnow
Copy link
Contributor Author

Jzjsnow commented Oct 31, 2025

Can we move forward with this feature now? @Jzjsnow @klion26

Sure, I've updated the branch and added the new evaluation criteria discussed earlier (see Step 1 for details).

The current conditions for triggering pendingInput evaluation based on metrics are as follows:
Step 1: If the condition delete file=0 && avg file size > target size * ratio is met, the evaluation is considered unnecessary and will be skipped.
Step 2: Calculate detailed attributes for each partition in the table, including the sum of squared errors for file sizes. If this exceeds the file size tolerance threshold, the pendingInput requires evaluation.

Note that this update now supports MIX_ICEBERG tables, whereas previously only ICEBERG format was supported.

Please take a look when you are free. Looking forward to your feedback! @xxubai @zhoujinsong @klion26

"self-optimizing.evaluation.average-file-size.tolerance"; // the minimum tolerance value for
// the average
// partition file size (between 0 and (self-optimizing.target-size))
public static final MemorySize SELF_OPTIMIZING_EVALUATION_AVERAGE_FILE_SIZE_TOLERANCE_DEFAULT =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use byte size to unify the file size unit?


CloseableIterable<PartitionFileBaseInfo> tableFiles =
getTableFilesInternal(amoroTable, null, null);
try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can use try catch with resource to close the io automaticly

Suggested change
try {
try (CloseableIterable<PartitionFileBaseInfo> tableFiles
= getTableFilesInternal(amoroTable, null, null)) {
for (PartitionFileBaseInfo fileInfo : tableFiles) {
refreshPartitionBasicInfo(fileInfo, partitionBaseInfoHashMap);
}
} catch (IOException e) {
LOG.warn("Failed to close the manifest reader.", e);
}

MixedTable table, long minTargetSize) {
Map<String, PartitionBaseInfo> partitionBaseInfoHashMap = new HashMap<>();
CloseableIterable<PartitionFileBaseInfo> tableFiles = getTableFilesInternal(table, null, null);
try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simply use a try-with-resources statement.

return true;
}

ExecutorService executorService = ThreadPools.getWorkerPool();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a dedicated thread pool to avoid thread congestion.

Suggested change
ExecutorService executorService = ThreadPools.getWorkerPool();
ExecutorService executorService = IcebergThreadPools.getPlanningExecutor();

return getTableFilesInternal(mixedTable, partition, specId);
}

private CloseableIterable<PartitionFileBaseInfo> getTableFilesInternal(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, after the event-triggered evaluation, a full table scan is performed to collect partition information, which can be very expensive (especially for large tables with hundreds of thousands of files). Perhaps we can optimize this part when upgrading the Iceberg version and introducing PartitionStatistics.

package org.apache.amoro.table;

/** Detailed table partition properties list. */
public class TablePartitionDetailProperties {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we simply this name? such as PartitionSummaryProperties

&& lastOptimizedSnapshotId != defaultTableRuntime.getCurrentSnapshotId())) {
tryEvaluatingPendingInput(defaultTableRuntime, mixedTable);
if (!defaultTableRuntime.getOptimizingConfig().isEventBasedTriggerEnabled()
|| MetricBasedRefreshEvent.isEvaluatingPendingInputNecessary(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this cause an additional full table scan compared to before?
In addition, we should also check whether optimization is enabled, so it would be better to combine this with tryEvaluatingPendingInput to avoid extra overhead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Subtask]: Add support for Metadata Metric-Driven refresh event

2 participants